Unsupervised morphological segmentation and clustering with document boundaries

Authors

  • Taesun Moon
  • Katrin Erk
  • Jason Baldridge

Abstract

Many approaches to unsupervised morphology acquisition incorporate the frequency of character sequences with respect to each other to identify word stems and affixes. This typically involves heuristic search procedures and calibrating multiple arbitrary thresholds. We present a simple approach that uses no thresholds other than those involved in standard application of χ² significance testing. A key part of our approach is using document boundaries to constrain generation of candidate stems and affixes and clustering morphological variants of a given word stem. We evaluate our model on English and the Mayan language Uspanteko; it compares favorably to two benchmark systems which use considerably more complex strategies and rely more on experimentally chosen threshold values.
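
The abstract does not spell out the procedure. The following is therefore only a minimal sketch of one plausible reading, not the authors' method. It proposes a stem + suffix split only when the bare stem and the full word occur in the same document. It keeps a split when a 2×2 χ² test over documents shows that stem and word co-occur more often than chance. It then clusters accepted variants under their shared stem. The function names, the contingency design, and the use of 3.841 (the standard χ² critical value for 1 degree of freedom at α = 0.05) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sketch: critical value of the chi-squared distribution with
# 1 degree of freedom at the 0.05 significance level (standard table value).
CHI2_CRIT_1DF_05 = 3.841


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    without continuity correction. Returns 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def candidate_pairs(documents):
    """Propose (stem, suffix) splits only when the bare stem and the full
    word occur in the same document (assumed document-boundary constraint)."""
    pairs = set()
    for doc in documents:
        vocab = set(doc)
        for word in vocab:
            for i in range(1, len(word)):
                stem, suffix = word[:i], word[i:]
                if stem in vocab:
                    pairs.add((stem, suffix))
    return pairs


def significant_pairs(documents, critical=CHI2_CRIT_1DF_05):
    """Keep candidate splits whose stem and stem+suffix co-occur across
    documents more often than chance, according to a 2x2 chi-squared test."""
    vocabs = [set(doc) for doc in documents]
    accepted = []
    for stem, suffix in sorted(candidate_pairs(documents)):
        word = stem + suffix
        a = sum(1 for v in vocabs if stem in v and word in v)      # both
        b = sum(1 for v in vocabs if stem in v and word not in v)  # stem only
        c = sum(1 for v in vocabs if stem not in v and word in v)  # word only
        d = len(vocabs) - a - b - c                                # neither
        # Require a positive association (a*d > b*c) that also passes the test.
        if a * d > b * c and chi2_2x2(a, b, c, d) > critical:
            accepted.append((stem, suffix))
    return accepted


def cluster_by_stem(pairs):
    """Group morphological variants under their shared stem."""
    clusters = defaultdict(set)
    for stem, suffix in pairs:
        clusters[stem].update({stem, stem + suffix})
    return dict(clusters)


# Toy usage: "walk" and "walked" share four documents; four others contain neither.
docs = [["walk", "walked"]] * 4 + [["cat"]] * 4
pairs = significant_pairs(docs)
print(pairs)                                   # [('walk', 'ed')]
print(sorted(cluster_by_stem(pairs)["walk"]))  # ['walk', 'walked']
```

In this sketch, counting document presence rather than raw frequency is what gives document boundaries their role. A stem and a candidate variant count as evidence for each other only when they are used in the same document. With very small corpora, the expected cell counts fall below the usual rule of thumb of 5. An exact test may then be more appropriate, and the abstract does not say how such cases are handled.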

Publication date: 2009